Conversation
…vity-check Add community activity tracking for reports
…e-instruction-column Add outdated upgrade instructions to reports
Rollback
…ackages-0487ic Add script to analyze outdated packages
Important
Review skipped
Review was skipped due to path filters.
⛔ Files ignored due to path filters (2)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters.
Note
Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Introduces a new OutdatedPackageAnalysis script and supporting utilities to analyze Python package upgrades, community activity, and vulnerability-free version suggestions. Adds CommunityActivityUtils for activity timelines and a new async helper in VersionSuggester. Adjusts formatting in GenerateReport without logic changes.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant U as User/CLI
participant OA as OutdatedPackageAnalysis
participant RF as Requirements File
participant PY as PyPI API
participant VS as VersionSuggester (OSV)
participant CA as CommunityActivityUtils (GitHub)
U->>OA: main(args)
OA->>RF: read package==version list
loop per package (async)
OA->>PY: fetch releases/metadata
OA->>VS: find_latest_safe_version_for_major(pkg, curr, all, major)
VS->>VS: filter/sort candidates
VS->>VS: query OSV (semaphore-limited)
OA->>CA: get_activity_dates(pkg, curr, pypi_info)
CA->>PY: (optional) use pre-fetched PyPI info
CA->>CA: resolve GitHub repo
CA->>GitHub: GraphQL/REST (cached, token)
CA-->>OA: last-active dates
end
OA-->>U: CSV report path
sequenceDiagram
autonumber
participant CA as CommunityActivityUtils
participant PY as PyPI API
participant GH as GitHub API
CA->>PY: Get package info
CA->>CA: Extract repo URL
alt GraphQL available
CA->>GH: GraphQL last activity
else REST fallback
CA->>GH: REST last activity (ETag/If-Modified-Since)
end
CA->>CA: Compute latest dates (current major, overall)
CA-->>Caller: ("YYYY-MM-DD" | "Unknown", "YYYY-MM-DD" | "Unknown")
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
… sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
This PR is being reviewed by Cursor Bugbot
Actionable comments posted: 1
♻️ Duplicate comments (3)
utils/VersionSuggester.py (1)
45-76: Guard against accidental use of legacy suggest_upgrade_version.
Per prior learning, this was deprecated in favor of suggest_safe_minor_upgrade. Ensure it's not referenced.
Run:
#!/bin/bash
rg -nP '\bsuggest_upgrade_version\s*\(' -g '!**/dist/**' -g '!**/build/**'
utils/CommunityActivityUtils.py (2)
320-349: Re: prior "Incomplete URL substring sanitization" alert: normalization now anchors on hostname.
_normalize_github_url uses urlparse.hostname and an explicit allowlist for github.com/www.github.com, mitigating substring tricks.
414-419: Incorrect GitHub GraphQL endpoint (uses github.com instead of api.github.com).
Current code calls https://github.com/graphql, which will fail; it should be https://api.github.com/graphql.
Apply this diff:
- gql_url = f"{_GITHUB_API.replace('api.', '')}/graphql"
+ gql_url = f"{_GITHUB_API}/graphql"
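To see why the old expression points at the wrong host, here is a minimal sketch. It assumes `_GITHUB_API` holds the REST base URL `https://api.github.com`; the diff implies this, but the constant's definition is not shown in the review.

```python
# Assumption: _GITHUB_API is the REST base URL (not shown in the review).
_GITHUB_API = "https://api.github.com"

# Old expression: stripping "api." yields the website host, not the API host.
old_url = f"{_GITHUB_API.replace('api.', '')}/graphql"

# Suggested expression: keep the API host and append the GraphQL path.
new_url = f"{_GITHUB_API}/graphql"

print(old_url)  # https://github.com/graphql
print(new_url)  # https://api.github.com/graphql
```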
🧹 Nitpick comments (9)
utils/VersionSuggester.py (3)
195-207: Narrow exception handling around OSV checks.
Catching Exception hides real failures and trips BLE001. Limit to aiohttp/timeout errors.
Apply this diff:
-    async with aiohttp.ClientSession() as session:
-        sem = asyncio.Semaphore(5)
-        for _, ver_str in candidates:
-            try:
-                _, status, _ = await fetch_osv(session, pkg, ver_str, sem)
-            except Exception as exc:  # pragma: no cover - network safety
-                logger.warning(
-                    f"Failed to verify vulnerabilities for {pkg}=={ver_str}: {exc}"
-                )
-                continue
+    async with aiohttp.ClientSession() as session:
+        sem = asyncio.Semaphore(5)
+        for _, ver_str in candidates:
+            try:
+                _, status, _ = await fetch_osv(session, pkg, ver_str, sem)
+            except (aiohttp.ClientError, asyncio.TimeoutError) as exc:  # pragma: no cover
+                logger.warning("OSV check failed for %s==%s: %s", pkg, ver_str, exc)
+                continue
136-141: Allow caller-provided session/semaphore to avoid per-call client creation.
This reduces overhead when scanning many packages (used by OutdatedPackageAnalysis).
Apply this diff:
-async def find_latest_safe_version_for_major(
+async def find_latest_safe_version_for_major(
     pkg: str,
     current_version: str,
     all_versions: list[str],
     target_major: int,
-) -> str | None:
+    session: aiohttp.ClientSession | None = None,
+    sem: asyncio.Semaphore | None = None,
+) -> str | None:
@@
-    async with aiohttp.ClientSession() as session:
-        sem = asyncio.Semaphore(5)
-        for _, ver_str in candidates:
+    own_session = session is None
+    if sem is None:
+        sem = asyncio.Semaphore(5)
+    if own_session:
+        session = aiohttp.ClientSession()
+    try:
+        for _, ver_str in candidates:
             try:
                 _, status, _ = await fetch_osv(session, pkg, ver_str, sem)
             except (aiohttp.ClientError, asyncio.TimeoutError) as exc:  # pragma: no cover - network safety
                 logger.warning(
                     f"Failed to verify vulnerabilities for {pkg}=={ver_str}: {exc}"
                 )
                 continue
@@
-    return None
+    finally:
+        if own_session and session:
+            await session.close()
+    return None
Also applies to: 195-209
110-127: Optional: check candidates concurrently (bounded) for faster results.
You can fire a small batch (e.g., top 8) with asyncio.as_completed under the same semaphore; stop on first "No".
I can provide a concrete patch if you’d like this optimization.
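A rough sketch of the batched idea follows, under stated assumptions: `fake_osv_status` is a hypothetical stand-in for the real `fetch_osv` call, and the version data is invented. Note that it uses `asyncio.gather` per batch rather than `asyncio.as_completed`, because `as_completed` yields results in completion order and could return an older safe version before a newer one has finished checking.

```python
from __future__ import annotations

import asyncio


async def fake_osv_status(version: str, sem: asyncio.Semaphore) -> str:
    """Hypothetical stand-in for an OSV lookup; returns "Yes" if vulnerable."""
    async with sem:
        await asyncio.sleep(0.01)  # simulate network latency
        return "Yes" if version in {"2.3.0", "2.2.1"} else "No"


async def first_safe(candidates: list[str], batch_size: int = 8) -> str | None:
    """Return the first candidate (given newest-first) whose status is "No"."""
    sem = asyncio.Semaphore(5)  # cap concurrent lookups
    for start in range(0, len(candidates), batch_size):
        batch = candidates[start:start + batch_size]
        # gather preserves input order, so newest-first ranking is kept.
        statuses = await asyncio.gather(
            *(fake_osv_status(v, sem) for v in batch)
        )
        for version, status in zip(batch, statuses):
            if status == "No":
                return version
    return None


result = asyncio.run(first_safe(["2.3.0", "2.2.1", "2.2.0", "2.1.0"]))
print(result)  # 2.2.0
```

The trade-off is that a batch may spend a few lookups on versions older than the one eventually returned, in exchange for lower wall-clock time.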
utils/CommunityActivityUtils.py (4)
61-63: Narrow cache init exception and log why cache is ignored.
Avoid bare Exception and silent fallback.
Apply this diff:
-try:
-    _ETAG_CACHE = json.loads(_CACHE_PATH.read_text(encoding="utf-8"))
-except Exception:
-    _ETAG_CACHE = {}  # {key: {"etag": "...", "last_modified": "...", "payload": {...}, "fetched_at": "..."}}
+try:
+    _ETAG_CACHE = json.loads(_CACHE_PATH.read_text(encoding="utf-8"))
+except (OSError, json.JSONDecodeError) as exc:
+    logger.debug("Initializing GitHub cache failed: %s", exc)
+    _ETAG_CACHE = {}  # {key: {"etag": "...", "last_modified": "...", "payload": {...}, "fetched_at": "..."}}
83-86: Do not swallow cache write failures.
Log at debug to aid diagnosability.
Apply this diff:
-    try:
-        _CACHE_PATH.write_text(json.dumps(_ETAG_CACHE), encoding="utf-8")
-    except Exception:
-        pass
+    try:
+        _CACHE_PATH.write_text(json.dumps(_ETAG_CACHE), encoding="utf-8")
+    except OSError as exc:
+        logger.debug("Writing GitHub cache failed: %s", exc)
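The two cache suggestions can be exercised together in a small standalone sketch. The helper names `load_cache` and `save_cache` and the temporary path are illustrative assumptions, not names from the PR.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def load_cache(path: Path) -> dict:
    """Load a JSON cache, falling back to {} on missing or invalid files."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        logger.debug("Initializing cache failed: %s", exc)
        return {}


def save_cache(path: Path, cache: dict) -> bool:
    """Write the cache; log instead of silently ignoring I/O errors."""
    try:
        path.write_text(json.dumps(cache), encoding="utf-8")
    except OSError as exc:
        logger.debug("Writing cache failed: %s", exc)
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "cache.json"
    missing = load_cache(cache_path)                    # file absent
    cache_path.write_text("{not json", encoding="utf-8")
    corrupt = load_cache(cache_path)                    # invalid JSON
    saved = save_cache(cache_path, {"k": {"etag": "abc"}})
    reloaded = load_cache(cache_path)
```

One caveat worth weighing: a cache file containing invalid UTF-8 raises UnicodeDecodeError, which this exception tuple does not cover, so the narrowed handler is slightly less forgiving than the original blanket one.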
285-295: Overbroad try/except in host check.
urlparse won't raise here; remove blanket except to avoid masking bugs.
Apply this diff:
-def _is_github_host(url: str) -> bool:
-    try:
-        parsed = urlparse(url)
-        hostname = parsed.hostname
-        if not hostname:
-            return False
-        hostname = hostname.lower()
-        return hostname == "github.com" or hostname.endswith(".github.com")
-    except Exception:
-        return False
+def _is_github_host(url: str) -> bool:
+    parsed = urlparse(url)
+    hostname = (parsed.hostname or "").lower()
+    return hostname == "github.com" or hostname.endswith(".github.com")
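The hostname-based check can be tried against a few lookalike URLs. This is a sketch, not the PR's code: `is_github_host` is a renamed copy for illustration, and the sample URLs are invented. One caveat to the comment above: `urlparse` can raise ValueError on malformed bracketed hosts, so the sketch keeps a narrow `except ValueError` rather than no handler at all.

```python
from urllib.parse import urlparse


def is_github_host(url: str) -> bool:
    """True only when the parsed hostname is github.com or a subdomain."""
    try:
        hostname = (urlparse(url).hostname or "").lower()
    except ValueError:  # e.g. unbalanced "[" in the network location
        return False
    return hostname == "github.com" or hostname.endswith(".github.com")


samples = [
    "https://github.com/psf/requests",
    "https://www.github.com/psf/requests",
    "https://github.com.evil.example/x",   # suffix trick
    "https://evil.example/github.com",     # path trick
    "https://notgithub.com/x",             # missing dot boundary
    "http://[",                            # malformed, raises ValueError
]
checks = {url: is_github_host(url) for url in samples}
for url, ok in checks.items():
    print(f"{ok!s:5} {url}")
```

Only the first two samples pass; a plain substring test such as `"github.com" in url` would accept all but the last.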
450-457: Minor: rename unused loop variable to underscore to satisfy linters.
No behavior change.
Apply this diff:
-    for attempt in range(max_retries):
+    for _attempt in range(max_retries):
@@
-    for attempt in range(max_retries):
+    for _attempt in range(max_retries):
Also applies to: 513-520
OutdatedPackageAnalysis.py (2)
268-281: Process packages concurrently with a bounded semaphore.
Significantly reduces wall-clock time while respecting external rate limits.
Apply this diff:
-async def _generate_reports(
-    packages: list[tuple[str, str]],
-) -> list[PackageReport]:
-    """Process packages sequentially and collect report rows."""
-
-    results: list[PackageReport] = []
-    total = len(packages)
-    for idx, (name, version_str) in enumerate(packages, start=1):
-        logger.info("[%d/%d] Evaluating %s==%s", idx, total, name, version_str)
-        report = await _process_package(name, version_str)
-        if report:
-            results.append(report)
-    return results
+async def _generate_reports(
+    packages: list[tuple[str, str]],
+) -> list[PackageReport]:
+    """Process packages concurrently with a small cap."""
+    sem = asyncio.Semaphore(5)
+    total = len(packages)
+
+    async def run_one(idx: int, name: str, version_str: str) -> PackageReport | None:
+        async with sem:
+            logger.info("[%d/%d] Evaluating %s==%s", idx, total, name, version_str)
+            return await _process_package(name, version_str)
+
+    tasks = [run_one(i, n, v) for i, (n, v) in enumerate(packages, start=1)]
+    results = await asyncio.gather(*tasks)
+    return [r for r in results if r]
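The bounded-concurrency pattern in this diff can be run in isolation. In the sketch below, `process` is a hypothetical stand-in for `_process_package`, and the package list is invented; the peak counter exists only to show that the semaphore cap holds.

```python
from __future__ import annotations

import asyncio

peak = 0     # highest number of tasks observed inside the semaphore
active = 0   # tasks currently inside the semaphore


async def process(name: str, version: str) -> dict | None:
    """Stand-in for _process_package; returns None for up-to-date packages."""
    global peak, active
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # simulate network I/O
    active -= 1
    return None if version == "latest" else {"package": name, "current": version}


async def generate_reports(packages: list[tuple[str, str]], limit: int = 2):
    sem = asyncio.Semaphore(limit)

    async def run_one(name: str, version: str):
        async with sem:  # at most `limit` packages in flight
            return await process(name, version)

    # gather returns results in input order, so report rows stay stable.
    results = await asyncio.gather(*(run_one(n, v) for n, v in packages))
    return [r for r in results if r]


pkgs = [("a", "1.0"), ("b", "latest"), ("c", "2.1"), ("d", "0.9"), ("e", "latest")]
reports = asyncio.run(generate_reports(pkgs))
print(reports, "peak concurrency:", peak)
```

Because `asyncio.gather` preserves input order, the CSV rows come out in the same order as the sequential version even though the work overlaps.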
286-294: Clarify CSV header label.
Header says "Is Major/Second Major Version" but code may emit "Current Major". Make the header explicit.
Apply this diff:
- "Is Major/Second Major Version",
+ "Target Major (Latest/Second/Current)",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
MonthlyReport/2025-09/MonthlyReport-202509-16-1749.xlsx is excluded by !**/*.xlsx
WeeklyReport/2025-09-15/WeeklyReport_20250916_173749.csv is excluded by !**/*.csv
📒 Files selected for processing (4)
GenerateReport.py (1 hunks)
OutdatedPackageAnalysis.py (1 hunks)
utils/CommunityActivityUtils.py (1 hunks)
utils/VersionSuggester.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-24T03:17:27.150Z
Learnt from: TongWu
PR: TongWu/PythonPackageManager#14
File: utils/VersionSuggester.py:136-137
Timestamp: 2025-06-24T03:17:27.150Z
Learning: In utils/VersionSuggester.py, the suggest_upgrade_version function has been intentionally disabled as it's outdated. The preferred approach is to use suggest_safe_minor_upgrade function instead for version suggestions.
Applied to files:
utils/VersionSuggester.py
🧬 Code graph analysis (3)
utils/CommunityActivityUtils.py (1)
utils/PyPiUtils.py (1)
GetPyPiInfo(27-44)
utils/VersionSuggester.py (1)
utils/VulnChecker.py (1)
fetch_osv(30-102)
OutdatedPackageAnalysis.py (3)
utils/CommunityActivityUtils.py (1)
get_activity_dates(113-140)
utils/PyPiUtils.py (1)
GetPyPiInfo(27-44)
utils/VersionSuggester.py (1)
find_latest_safe_version_for_major(136-209)
🪛 Ruff (0.12.2)
utils/CommunityActivityUtils.py
62-62: Do not catch blind exception: Exception
(BLE001)
85-86: try-except-pass detected, consider logging the exception
(S110)
85-85: Do not catch blind exception: Exception
(BLE001)
293-293: Do not catch blind exception: Exception
(BLE001)
450-450: Loop control variable attempt not used within loop body
Rename unused attempt to _attempt
(B007)
455-455: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
474-474: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
478-478: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
513-513: Loop control variable attempt not used within loop body
Rename unused attempt to _attempt
(B007)
518-518: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
532-532: Do not catch blind exception: Exception
(BLE001)
utils/VersionSuggester.py
200-200: Do not catch blind exception: Exception
(BLE001)
OutdatedPackageAnalysis.py
1-1: Shebang is present but file is not executable
(EXE001)
312-312: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
GenerateReport.py (1)
320-322: Formatting-only change; LGTM.
No logic impact detected on monthly_df columns.
async def _process_package(
    package: str,
    current_version_str: str,
) -> PackageReport | None:
    """Inspect a single package and generate a report entry when outdated."""

    info = GetPyPiInfo(package)
    if not info:
        logger.warning("PyPI metadata unavailable for %s; skipping", package)
        return None
Blocking requests inside async path — move to threads.
GetPyPiInfo and get_activity_dates use requests; calling them directly in async blocks the event loop.
Apply this diff:
- info = GetPyPiInfo(package)
+ info = await asyncio.to_thread(GetPyPiInfo, package)
@@
- last_active_current_major, last_active_package = get_activity_dates(
- package, current_version_str, info
- )
+ last_active_current_major, last_active_package = await asyncio.to_thread(
+ get_activity_dates, package, current_version_str, info
+ )
Also applies to: 253-255
🤖 Prompt for AI Agents
In OutdatedPackageAnalysis.py around lines 211-221 (and similarly at 253-255),
the synchronous HTTP calls (GetPyPiInfo and get_activity_dates) are being
invoked directly inside an async function which blocks the event loop; change
those calls to run on a thread by using asyncio.to_thread (or
loop.run_in_executor) and await the result (e.g., info = await
asyncio.to_thread(GetPyPiInfo, package) and activity = await
asyncio.to_thread(get_activity_dates, ...)), update imports if needed (import
asyncio), and ensure error handling/logging remains the same after awaiting the
threaded call.
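The effect of moving a blocking call onto a thread can be observed with a small experiment. This is a sketch under assumptions: `blocking_fetch` is a hypothetical stand-in for the requests-based `GetPyPiInfo`, with `time.sleep` simulating the HTTP round trip. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def blocking_fetch(package: str) -> dict:
    """Stand-in for a synchronous HTTP call (e.g. one made with requests)."""
    time.sleep(0.2)  # blocks the calling thread
    return {"name": package}


async def main() -> tuple[dict, int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the event loop gets to run while we wait.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    # Runs in a worker thread, so the loop keeps servicing heartbeat().
    info = await asyncio.to_thread(blocking_fetch, "requests")
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return info, ticks


info, ticks = asyncio.run(main())
print(info, "heartbeat ticks while waiting:", ticks)
```

Calling `blocking_fetch("requests")` directly inside `main` would leave the heartbeat at zero ticks for the whole 0.2 seconds, which is the stall the review comment describes.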
Summary by CodeRabbit